AITopics

Country: North America > Canada (0.04)

Genre: Workflow (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.69)

Neural Information Processing Systems, Oct-2-2025, 13:58:05 GMT

Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

When the index of the agent's last visited state embedding in the demonstration

agent, artificial intelligence, machine learning, (18 more...)

Genre: Workflow (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.69)

Neural Information Processing Systems, Sep-25-2025, 23:08:22 GMT

30754e5f4cd69d64b5527cdd87d3cf62-Paper-Conference.pdf

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Country: North America > Canada (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Energy > Oil & Gas > Upstream (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing Systems, Aug-17-2025, 02:58:47 GMT

First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs

Norman, Ben

The objective is to maximize the total reward accumulated over all episodes (e.g., the number of games won), expressed as

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Country: North America > Canada (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Energy > Oil & Gas > Upstream (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence, Jan-7-2025

HIVEX: A High-Impact Environment Suite for Multi-Agent Research (extended version)

Siedler, Philipp D.

Games have been vital test beds for the rapid development of agent-based research. Remarkable progress has been achieved in the past, but it is unclear whether the findings equip us for real-world problems. While pressure grows, some of the most critical ecological challenges can find mitigation and prevention solutions through technology and its applications. Most real-world domains include multi-agent scenarios and require machine-machine and human-machine collaboration. Open-source environments have not advanced and are often toy scenarios, too abstract, or not suitable for multi-agent research. By mimicking real-world problems and increasing the complexity of environments, we hope to advance state-of-the-art multi-agent research and inspire researchers to work on immediate real-world problems. Here, we present HIVEX, an environment suite to benchmark multi-agent research focusing on ecological challenges. HIVEX includes the following environments: Wind Farm Control, Wildfire Resource Management, Drone-Based Reforestation, Ocean Plastic Collection, and Aerial Wildfire Suppression. We provide environments, training examples, and baselines for the main and sub-tasks. All trained models resulting from the experiments of this work are hosted on Hugging Face. We also provide a leaderboard on Hugging Face and encourage the community to submit models trained on our environment suite.

artificial intelligence, machine learning, training, (15 more...)

2501.0418

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Workflow (1.00)
Research Report (0.81)

Industry:

Energy > Renewable > Wind (1.00)
Government (0.92)
Leisure & Entertainment > Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

arXiv.org Artificial Intelligence, Dec-14-2023

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Li, Hao, Yang, Xue, Wang, Zhaokai, Zhu, Xizhou, Zhou, Jie, Qiao, Yu, Wang, Xiaogang, Li, Hongsheng, Lu, Lewei, Dai, Jifeng

Traditional reinforcement-learning-based agents rely on sparse rewards that often only use binary values to indicate task completion or failure. The challenge in exploration efficiency makes it difficult to effectively learn complex tasks in Minecraft. To address this, this paper introduces an advanced learning system, named Auto MC-Reward, that leverages Large Language Models (LLMs) to automatically design dense reward functions, thereby enhancing the learning efficiency. Auto MC-Reward consists of three important components: Reward Designer, Reward Critic, and Trajectory Analyzer. Given the environment information and task descriptions, the Reward Designer first designs the reward function by coding an executable Python function with predefined observation inputs. Then, the Reward Critic verifies the code, checking whether it is self-consistent and free of syntax and semantic errors. Further, the Trajectory Analyzer summarizes possible failure causes and provides refinement suggestions according to collected trajectories. In the next round, the Reward Designer further refines and iterates on the dense reward function based on this feedback. Experiments demonstrate a significant improvement in the success rate and learning efficiency of our agents in complex tasks in Minecraft, such as obtaining diamond while efficiently avoiding lava, and efficiently exploring for trees and animals that are sparse in the plains biome.
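To make the Reward Designer's output more concrete, here is a minimal sketch of the kind of dense reward function such a system might emit, together with a crude syntax-only check standing in for one part of the Reward Critic. The observation keys (`dist_to_tree`, `lava_nearby`, `task_done`), the weights, and both function names are hypothetical placeholders chosen for illustration; none of them are taken from the paper.

```python
import ast


def dense_reward(prev_obs: dict, obs: dict) -> float:
    """Hypothetical dense reward: progress toward a tree, penalty near lava."""
    reward = 0.0
    # Sparse component: large bonus when the task is completed.
    if obs["task_done"]:
        reward += 10.0
    # Dense component: reward the reduction in distance to the nearest tree.
    reward += prev_obs["dist_to_tree"] - obs["dist_to_tree"]
    # Safety component: discourage standing next to lava.
    if obs["lava_nearby"]:
        reward -= 1.0
    return reward


def passes_syntax_check(source: str) -> bool:
    """Crude stand-in for a critic: only checks that the code parses."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```

A syntax check of this kind catches only malformed code; the self-consistency and semantic checks described in the abstract would need more than parsing.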

agent, health, reward function, (15 more...)

2312.09238

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Sweden > Skåne County > Malmö (0.04)
Asia > China > Hong Kong (0.04)

Genre: Workflow (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Dec-13-2023

Experiential Explanations for Reinforcement Learning

Alabdulkarim, Amal, Singh, Madhuri, Mansi, Gennie, Hall, Kaely, Riedl, Mark O.

Reinforcement Learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due in part to the sequential nature of RL in which actions are chosen because of future rewards. However, RL agents discard the qualitative features of their training, making it difficult to recover user-understandable information for "why" an action is chosen. We propose a technique, Experiential Explanations, to generate counterfactual explanations by training influence predictors along with the RL policy. Influence predictors are models that learn how sources of reward affect the agent in different states, thus restoring information about how the policy reflects the environment. A human evaluation study revealed that participants presented with experiential explanations were better able to correctly guess what an agent would do than those presented with other standard types of explanation. Participants also found that experiential explanations are more understandable, satisfying, complete, useful, and accurate. The qualitative analysis provides insights into the factors of experiential explanations that are most useful.

agent, explanation, influence predictor, (12 more...)

2210.04723

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Genre:

Questionnaire & Opinion Survey (0.94)
Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Artificial Intelligence, Oct-20-2023

Reward Shaping for Happier Autonomous Cyber Security Agents

Bates, Elizabeth, Mavroudis, Vasilios, Hicks, Chris

As machine learning models become more capable, they have exhibited increased potential in solving complex tasks. One of the most promising directions uses deep reinforcement learning to train autonomous agents in computer network defense tasks. This work studies the impact of the reward signal that is provided to the agents when training for this task. Due to the nature of cybersecurity tasks, the reward signal is typically 1) in the form of penalties (e.g., when a compromise occurs), and 2) distributed sparsely across each defense episode. Such reward characteristics are atypical of classic reinforcement learning tasks where the agent is regularly rewarded for progress (as opposed to being occasionally penalized for failures). We investigate reward shaping techniques that could bridge this gap so as to enable agents to train more sample-efficiently and potentially converge to better performance. We first show that deep reinforcement learning algorithms are sensitive to the magnitude of the penalties and their relative size. Then, we combine penalties with positive external rewards and study their effect compared to penalty-only training. Finally, we evaluate intrinsic curiosity as an internal positive reward mechanism and discuss why it might not be as advantageous for high-level network monitoring tasks.
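As a rough illustration of combining sparse penalties with positive rewards, the following sketch returns a rescaled penalty when a compromise occurs and a small bonus on every other step. The function names, the scaling parameters, and the choice to reward each uncompromised step are assumptions made for illustration, not the authors' reward design.

```python
def shaped_reward(compromised: bool,
                  penalty: float = 1.0,
                  penalty_scale: float = 1.0,
                  step_bonus: float = 0.1) -> float:
    """Hypothetical shaped reward for one step of a defense episode."""
    if compromised:
        # Sparse negative signal, rescaled to study sensitivity to magnitude.
        return -penalty_scale * penalty
    # Positive signal on every step in which no compromise occurred.
    return step_bonus


def episode_return(compromise_flags, **kwargs) -> float:
    """Undiscounted sum of shaped rewards over an episode."""
    return sum(shaped_reward(flag, **kwargs) for flag in compromise_flags)
```

Setting `step_bonus=0.0` recovers penalty-only training, so the two regimes can be compared by changing a single argument.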

agent, baseline, experiment, (14 more...)

2310.13565

Country:

Europe > United Kingdom (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Feb-6-2023

Uniswap Liquidity Provision: An Online Learning Approach

Bar-On, Yogev, Mansour, Yishay

Decentralized Exchanges (DEXs) are new types of marketplaces leveraging blockchain technology. They allow users to trade assets with Automated Market Makers (AMMs), using funds provided by liquidity providers, removing the need for order books. One such DEX, Uniswap v3, allows liquidity providers to allocate funds more efficiently by specifying an active price interval for their funds. This introduces the problem of finding an optimal strategy for choosing price intervals. We formalize this problem as an online learning problem with non-stochastic rewards. We use regret-minimization methods to show a liquidity provision strategy that guarantees a lower bound on the reward. This is true even for non-stochastic changes to asset pricing, and we express this bound in terms of the trading volume.
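One standard regret-minimization method for non-stochastic rewards is the Hedge (exponential weights) algorithm. The sketch below applies it to a finite set of candidate price intervals. The candidate intervals, the reward model (positive only when the price falls inside the interval, and larger for narrower intervals), and the learning rate `eta` are hypothetical; this is a generic Hedge sketch and not the strategy derived in the paper.

```python
import math


def interval_reward(interval, price, min_width):
    """Hypothetical reward in [0, 1]: narrower active intervals earn more."""
    lo, hi = interval
    if lo <= price <= hi:
        return min_width / (hi - lo)
    return 0.0  # liquidity outside the active range earns nothing


def hedge(intervals, prices, eta=0.5):
    """Run Hedge with full-information feedback; return the final weights."""
    min_width = min(hi - lo for lo, hi in intervals)
    weights = [1.0 / len(intervals)] * len(intervals)
    for price in prices:
        # Multiplicative update: boost each interval by its observed reward.
        weights = [w * math.exp(eta * interval_reward(iv, price, min_width))
                   for w, iv in zip(weights, intervals)]
        total = sum(weights)
        weights = [w / total for w in weights]  # normalise to a distribution
    return weights
```

For example, `hedge([(90, 110), (95, 105), (50, 150)], [100, 101, 99, 102, 98])` shifts most of the weight toward the narrow interval `(95, 105)`, because the price stays inside it in every round.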

artificial intelligence, liquidity provider, machine learning, (14 more...)

2302.0061

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry:

Banking & Finance > Trading (1.00)
Education > Educational Setting > Online (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.63)

arXiv.org Artificial Intelligence, Aug-18-2022

Explainable Reinforcement Learning on Financial Stock Trading using SHAP

Kumar, Satyam, Vishal, Mendhikar, Ravi, Vadlamani

Explainable Artificial Intelligence (XAI) research has gained prominence in recent years in response to the demand for greater transparency and trust in AI from the user communities. This is especially critical because AI is adopted in sensitive fields such as finance and medicine, where implications for society, ethics, and safety are immense. Following thorough systematic evaluations, work in XAI has primarily focused on Machine Learning (ML) for categorization, decision, or action. To the best of our knowledge, no work has been reported that offers an Explainable Reinforcement Learning (XRL) method for trading financial stocks. In this paper, we propose to employ SHapley Additive exPlanations (SHAP) on a popular deep reinforcement learning architecture, the deep Q-network (DQN), to explain an action of an agent at a given instance in financial stock trading. To demonstrate the effectiveness of our method, we tested it on two popular datasets, SENSEX and DJIA, and report the results.
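For intuition about what a Shapley-value explanation attributes to each input, the sketch below computes exact Shapley values by enumerating feature subsets for a toy linear Q-function. The Q-function `q_buy`, its three unnamed features, and the all-zero baseline used in the usage note are hypothetical. Brute-force enumeration is only feasible for a handful of features; a real DQN would rely on the approximations offered by the SHAP library instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value, the rest the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                # Shapley weight for a coalition of size k that excludes i.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def q_buy(state):
    """Toy Q-value for a 'buy' action, linear in three hypothetical features."""
    return 2.0 * state[0] - 1.0 * state[1] + 0.5 * state[2]
```

Calling `shapley_values(q_buy, [1.0, 2.0, 4.0], [0.0, 0.0, 0.0])` yields one attribution per feature, and the attributions sum to the difference between the Q-value at the state and the Q-value at the baseline.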

explanation, prediction, reward prediction, (13 more...)

2208.0879

Country:

Asia > India (0.15)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.51)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Banking & Finance > Trading (1.00)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)